Reference Model for Disease Progression

ABSTRACT

A method wherein reference disease models predict progression of disease within given populations, utilizing publicly available clinical data and risk equations, to give a bird's-eye view of clinical trials by allowing multiple trials to be systematically compared simultaneously via parallel processing/High Performance Computing, which allows competition among alternative equation/hypothesis combinations and cross validation, and then ranks results according to fitness via a fitness engine.

This application claims priority over, and incorporates by reference in its entirety, U.S. Provisional Application 61/806,365 filed on Mar. 28, 2013.

FIELD OF USE

The present invention relates to a computer model in which disease progression is calculated within given populations.

BACKGROUND OF THE INVENTION

Prior approaches to disease progression analysis use complex proprietary data, while the present invention utilizes publicly available data and does not require access to any proprietary data.

-   -   U.S. patent application Ser. No. 11/503,393 (David Eddy, et al.) discloses a method for simulating a clinical trial that includes: selecting a trial procedure for a simulated trial corresponding to the clinical trial; generating a population of subjects for the simulated trial; searching the population of subjects to determine acceptable subjects for the simulated trial; selecting subjects for the simulated trial from the acceptable subjects; simulating the trial procedure for the selected subjects; and collecting trial data for the simulated trial from the simulated trial procedure.
    -   U.S. patent application Ser. No. 12/788,242 (David Eddy, et al.) discloses a method of determining a quality of care provided by a healthcare provider to individuals in a population. A data processing apparatus that has one or more processors is disclosed. Data representing biomarkers for individuals in a population is received. Baseline and present risks are determined. Risk reduction values are determined. Based on the current risk reduction, a quality score is determined. A scale is created, and the quality score is mapped to the scale. The global quality score of the disclosure provides numerous benefits over past performance measures.
    -   U.S. Pat. No. 8,224,665 (Macdonald Morris) discloses a method and apparatus for predicting a health benefit for an individual. Outcomes from a first simulation on a set of simulated individuals reflecting a population are stored and used to determine a first risk function and corresponding cost values. Outcomes from a second simulation on a set of simulated individuals reflecting a healthcare intervention are stored and used to determine a second risk function reflecting the intervention and corresponding cost values of the intervention. A benefit function is derived from the difference of the first and second risk functions. A cost function that describes the cost of the intervention is derived from the respective cost values. The derived benefit function and cost function are used to predict the corresponding benefit and cost of the healthcare intervention for a given individual. Individuals can be ranked by degree of expected benefit.

The Cardiff Model discloses a method to evaluate the impact of new therapies in a population of T2DM patients, modeling disease progression through the implementation of the UK Prospective Diabetes Study (UKPDS) 68 outcomes equation. The model requires specification of age, sex, ethnicity, smoking status and duration of diabetes, and models changes to the following modifiable risk factors: total cholesterol, HDL cholesterol, systolic blood pressure, weight and glycosylated hemoglobin (HbA1c). While the time-dependent risk factor profiles are simulated through implementation of equations reported in the UKPDS 68 study, pre-specified HbA1c threshold values may be used to invoke escalation to second- and third-line therapies, with costs applied to all predicted complications in the year of occurrence. Healthcare maintenance costs are applied in all subsequent years following non-fatal events, with the costs of diabetes-related complications being drawn primarily from UKPDS 65, while baseline utility is modeled using age-dependent mean EQ-5D values in subjects with no major complications, obtained from the Health Survey for England 2003. Utility decrements associated with predicted complications are drawn primarily from UKPDS 62. Model output includes: micro-vascular complications (retinopathy, neuropathy, nephropathy); macro-vascular complications (congestive heart failure, myocardial infarction, stroke, ischaemic heart disease); hypoglycaemia; diabetes-specific mortality; all-cause mortality; and point estimates and probabilistic output for cost-effectiveness.

The CDC-RTI Diabetes Cost-Effectiveness Model discloses a method of disease progression and cost-effectiveness for type 2 diabetes, following patients from diagnosis to either death or 95 years of age. The model simulates development of diabetes related complications on three micro-vascular disease paths (nephropathy, neuropathy, and retinopathy) and two macro-vascular disease paths for diabetes screening and pre-diabetes, with model outcomes including: disease complications, deaths, costs, and quality-adjusted life years (QALYs). In the model, progression between disease states is governed by transition probabilities that depend on risk factors (including glycemic level, measured by HbA1c levels, blood pressure, cholesterol, and smoking status) and on the duration of diabetes. Interventions affect the transition probabilities and resulting complications. For example, tight glycemic control lowers HbA1c, slowing progression on the micro-vascular complication paths. With slower progression, fewer micro-vascular complications occur, so that death is delayed, QALYs increase, and the resulting cost of complications is reduced. The model has been used to estimate the cost-effectiveness of treatment interventions for patients with diagnosed diabetes while evaluating optimal resource allocation across interventions; assess whether screening for diabetes is cost-effective; show that lifestyle modification is cost-effective in delaying or preventing diabetes among persons with pre-diabetes; and estimate the cost-effectiveness of screening for pre-diabetes.

The Diabetes and Analysis Modeling Framework model uses established methods to develop the central simulation engine (CSE) that lies at the nucleus of DMAF. The architecture of DMAF has been designed so emerging evidence reported in the literature can be efficiently incorporated into the framework and evaluated for potential impact on immediate and long term outcomes. DMAF captures events occurring in routine patient care through an A1c sub model, bridging between patient-specific A1c and the incidence of complications, while multiplicative factors are taken from A1c vs. time curves from published head-to-head studies of the treatments considered. DMAF also contains treatment transition and scheduling based, by default, on the treatment consensus algorithm published by Nathan et al. The transitions between treatment strata are modifiable for sensitivity analysis, including the functionality to randomly sample a range of start times for additional treatment.

Disease models predict disease progression within a population. Yet, predictions differ among models and populations while models become outdated and do not account for improvement in treatment and newer medical advances.

Modeling treatment improvement on top of existing models is highly beneficial, while, to a lesser extent, including biomarker change is also beneficial; including both improvements together, i.e. treatment improvement and biomarker change, improves models in many cases.

The Reference Model is currently based on secondary data published in clinical trials, with published risk equations, while no individual data is necessary. Yet it is possible to use individual data and non-published risk equations with the model. The model's use of public data does not limit it.

SUMMARY OF THE INVENTION

A primary purpose of The Reference Model for disease progression is to facilitate a systematic comparison of model performance over multiple populations and to produce fitness information while also allowing hypotheses to be tested. For example, two hypotheses can be tested: that medical treatment improved through time beyond model prediction, and that biomarker change improves model predictions.

While disease models predict disease progression within a population, predictions differ among models and populations, and because models become outdated, they do not account for improvement in treatment and newer medical advances. Because of this, they should periodically be updated or include a temporal correction term for treatment improvement.

The Reference Model is built from publicly available data, using MIST (Micro-Simulation Tool), a Python based modeling framework which is available under the General Public License (GPL), and does not require access to proprietary, or individual patient, information. The software uses Monte Carlo simulations that are executed in parallel, and in the data shown in this application, the system can run on a single machine, on a cluster of machines, and on a cluster in the cloud. Because the model utilizes computing power and techniques such as parallel processing/high performance computing, cross validation, and competition among alternative equations and/or hypotheses combinations, and ranks results based on fitness via a fitness engine, multiple studies can be compared, and large amounts of study data, which were inaccessible before, are now available to determine disease progression over large populations comprising large data sets.

While further areas of applicability will become apparent from the description provided herein, the following description, examples and drawings are shown by way of example, and in no way limit the scope of the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a block diagram of The Reference Model.

FIG. 2 depicts a block diagram of The Reference Model and how it functions.

FIG. 3 depicts the parameters used in previously published risk equations, which are fed into The Reference Model.

FIG. 4 depicts the correction term.

FIG. 5 depicts the results of The Reference Model, both with and without the biomarker hypothesis.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 is an example of a multi-process state transition model that forms a base for The Reference Model. In this particular example, the processes are: CHD (coronary heart disease), stroke, and competing mortality. In the CHD process, the population data is entered into the appropriate state. During simulation, a healthy individual starts in the no CHD state; next, depending on the risk equation chosen and Monte Carlo random factors, the individual may flow into the MI (myocardial infarction) event state, after which individuals who died and individuals who survived are separated. The same occurs in the other processes, in this particular case stroke and competing mortality. Competing mortality covers those who died from something other than the processes being compared, which in this case are deaths from heart disease and stroke.
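The CHD process described above can be illustrated with a minimal Monte Carlo sketch. It is not the MIST implementation: the function name `simulate_chd`, the state labels, and the fixed yearly probabilities `p_mi` and `p_fatal_mi` are assumptions introduced for illustration, and the stroke and competing mortality processes are omitted for brevity. In The Reference Model the probabilities would come from the chosen risk equation rather than from constants.

```python
import random


def simulate_chd(n_individuals, years, p_mi, p_fatal_mi, seed=0):
    """Count final states of a single CHD process (illustrative only).

    p_mi and p_fatal_mi are hypothetical constant yearly probabilities;
    a real model would evaluate a risk equation per individual and year.
    """
    rng = random.Random(seed)
    counts = {"No CHD": 0, "Survived MI": 0, "MI Death": 0}
    for _ in range(n_individuals):
        state = "No CHD"
        for _ in range(years):
            if state != "No CHD":
                break  # the individual already left the healthy state
            if rng.random() < p_mi:
                # The MI event state splits individuals into died and survived.
                state = "MI Death" if rng.random() < p_fatal_mi else "Survived MI"
        counts[state] += 1
    return counts


if __name__ == "__main__":
    print(simulate_chd(1000, 10, p_mi=0.02, p_fatal_mi=0.3, seed=1))
```

Every individual ends in exactly one state, so the counts always sum to the population size.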

FIG. 2 shows an outline of how The Reference Model works. Multiple populations enter the model depicted in FIG. 1, then different risk equations are applied to different transitions to modify model behavior. In this example, 4 equations are used for MI and 4 equations are used for stroke. Once this data has been analyzed for all individuals in all populations and for all risk equations, the fitness matrix is produced, which depicts the difference of the simulation from observed outcomes.

FIG. 3 is a depiction of the biomarkers/parameters used in equations which represent observed phenomena and can be combined with the hypothesis and can be used in The Reference Model. In this example, it is easy to see that risk equations are different and therefore when applied to different cases will produce different results. This shows the motivation and the need for model comparison.

FIG. 4 is a depiction of the correction term, which accounts for model outdate using the particular parameters of the study used in this example of The Reference Model. In this case, the parameters are adjusted for the time passed between the model year (the average of the model data time interval) and the simulated time stamp (the simulated study year or years).

FIG. 5 shows the results of The Reference Model, split between results which utilized the biomarker hypothesis and results which did not. Other data depicted are: the fitness score matrix, the ranking of populations for each model, the ranking of models for each population, the overall rank of models, and a legend that matches fitness to color.

The Reference Model implements a modeling approach that runs multiple models over multiple populations to determine fitness of different equations/hypotheses to different populations.

16=4×4 different published equation combinations were tested, 4 for Myocardial Infarction (MI) and 4 for stroke. These 16 equation combinations were combined with hypotheses regarding treatment improvement and biomarker change to create 64 different model variations. Those 64 model variations were tested against 22 different clinical trial cohorts from diabetic populations. The equations/hypotheses combinations were ranked according to fitness. In summary, there were 64=4×4×2×2 equation/hypothesis variations and 22 cohorts with known outcomes from 4 diabetic populations: UKPDS, ASPEN, ADVANCE, and ACCORD. The Monte Carlo simulation included 10 repetitions of 1000 individuals for 10 simulation years; overall there were 14080 processes=64×22×10. These results were obtained, in this case, using a single 8 core desktop computer operating for 4 days.
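The counts above can be checked by enumerating the combinations directly. This is a sketch of the arithmetic only; the equation labels such as `MI_eq1` are placeholders, not the names of the published equations.

```python
from itertools import product

# Placeholder labels for the 4 MI and 4 stroke risk equations.
mi_equations = [f"MI_eq{i}" for i in range(1, 5)]
stroke_equations = [f"Stroke_eq{i}" for i in range(1, 5)]

# Each hypothesis is either excluded (False) or included (True).
treatment_improvement = (False, True)
biomarker_change = (False, True)

variations = list(
    product(mi_equations, stroke_equations, treatment_improvement, biomarker_change)
)

n_cohorts = 22
n_repetitions = 10
n_processes = len(variations) * n_cohorts * n_repetitions

if __name__ == "__main__":
    print(len(variations), n_processes)
```

Because each process is independent of the others, the set of processes can be distributed across cores or machines without coordination between them.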

The treatment improvement hypothesis was deduced from and was defined as a constant yearly improvement in the probability of MI, Stroke, Fatal MI, and Fatal Stroke. It was assumed the improvement is the same for every year and for every one of the probabilities. The biomarker change hypothesis used the end of study biomarker values in the equations starting after one year of simulation, whenever those numbers were published.
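One way to read a constant yearly improvement is as a multiplicative reduction applied once for each year between the model year and the simulated year. The text does not give the functional form, so the sketch below is an assumption, as are the function and parameter names.

```python
def corrected_probability(p_model, model_year, simulated_year, yearly_improvement):
    """Apply a hypothetical temporal correction to an event probability.

    Assumes the same fractional improvement every year (for example 0.05
    for 5%), compounded over the years elapsed since the model year. The
    multiplicative form is an illustrative choice, not taken from the source.
    """
    elapsed = max(0, simulated_year - model_year)
    return p_model * (1.0 - yearly_improvement) ** elapsed


if __name__ == "__main__":
    # A probability from an equation with model year 1990, simulated in 2000.
    print(corrected_probability(0.10, 1990, 2000, 0.05))
```

Under this reading the same correction would be applied to each of the four probabilities (MI, Stroke, Fatal MI, Fatal Stroke), consistent with the assumption that the improvement is the same for all of them.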

The fitness score matrix uses color coding and ranking to visually demonstrate the fitness between 4/22 populations/cohorts and 64 combinations of published risk equations and hypotheses. The results show that different combinations of risk equations behave differently on different population cohorts. For each query, the system ranks the models. Models that implement the following two corrections generally behaved better: temporal correction for treatment improvement, and biomarker change introduced in the first year.

The results suggest that including the treatment improvement hypothesis is beneficial for most models. Out of 32 model variations, 29 model variations that included the treatment improvement component performed better than their counterparts without treatment improvement when accounting for all 16 population cohorts. When considering high resolution results of model per specific population, out of 32×16=512 model and population combinations, there were only 103 instances (20.1%) where the treatment improvement hypothesis worsened the fitness to the published results of the clinical trials.

The results suggest that including the biomarker change hypothesis is beneficial for most models. Out of 32 model variations that included the biomarker change hypothesis, 28 had superior results to the models without this hypothesis, when considering all populations. The 4 model variations in which biomarker change did not improve results included the treatment improvement hypothesis.

Current published models do not include components to account for future improvement, which is reasonable since past behavior does not guarantee future behavior. Nevertheless, this causes models to become quickly outdated. Adding a correction for treatment improvement, as depicted in FIG. 4, will keep models up to date and improve their performance. Adding such a component requires calibration and validation. The Reference Model is a good tool for calibration and validation using multiple models and populations.

The Reference Model uses: systematic cross validation of models against populations using micro-simulation and relying on computing power; a fitness score that converts multiple outcome differences into a single number; and different queries with weights to rank model/population fitness. The methods avoid using restricted individual data and rely on more accessible summary data. Nevertheless, it is still possible to use proprietary information such as individual data and proprietary risk equations.

The Reference Model is built from publicly available data, using MIST (Micro-Simulation Tool), a Python based modeling framework which is available under the General Public License (GPL), and does not require access to proprietary, or individual patient, information. The software uses Monte Carlo simulations that are executed in parallel, and in the data shown in this application, the system can run on a single machine, on a cluster of machines, and on a cluster in the cloud. Because the model utilizes computing power and techniques such as parallel processing/high performance computing, cross validation, and competition among alternative equations and/or hypotheses combinations, and ranks results based on fitness via a fitness engine, multiple studies can be compared, and large amounts of study data, which were inaccessible before, are now available to determine disease progression over large populations comprising large data sets.

More specific details regarding the use of the technique:

-   -   A) The Reference Model helps select a suitable Model for a specific Population;

When predicting information for a new study with unknown results, it is not known which model is best to use for the prediction. However, if the baseline characteristics of a target population are known, it is possible to find the populations that resemble that population's baseline and so deduce the best model.

For example, a clinical trial for a young population of 20 year olds is being considered. The Reference Model is consulted and all its populations are searched to find the populations closest in their characteristics. Without loss of generality, let us assume that there are populations of 25, 40, and 65 year olds in the system with results. In the simplest case, the model that proved best fitting to the young 25 year old population will be chosen to predict disease progression in the young 20 year old target population. This simple case can be extended by defining a population distance function. Once such a function is defined by the user, the distances between base populations are known. This population distance function can be used as a factor when weighting fitness results from multiple populations and studies. This way, the ranking of the best model for the new target population will be based on information from more than a single population and be less sensitive to outliers.
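The age example can be sketched as follows. The distance function (absolute difference in mean age), the weighting `1 / (1 + distance)`, the model names, and the fitness numbers are all assumptions made for illustration; the text leaves the distance function to the user. Lower fitness scores are treated as better, since fitness here measures the difference from observed outcomes.

```python
def age_distance(target_age, population_age):
    """Hypothetical population distance: absolute difference in mean age."""
    return abs(target_age - population_age)


def rank_models(fitness, target_age):
    """Rank models by distance-weighted average fitness (lower is better).

    fitness maps model name -> {population mean age: fitness score}.
    """
    ranking = []
    for model, by_population in fitness.items():
        weights = {
            age: 1.0 / (1.0 + age_distance(target_age, age))
            for age in by_population
        }
        total_weight = sum(weights.values())
        weighted = sum(weights[age] * score for age, score in by_population.items())
        ranking.append((weighted / total_weight, model))
    return [model for _, model in sorted(ranking)]


# Made-up fitness scores for two models on populations aged 25, 40 and 65.
example_fitness = {
    "model_A": {25: 1.0, 40: 5.0, 65: 5.0},
    "model_B": {25: 3.0, 40: 1.0, 65: 1.0},
}

if __name__ == "__main__":
    print(rank_models(example_fitness, target_age=20))
```

With these numbers the 25 year old population carries the most weight for a 20 year old target, while the older populations still contribute, which is what makes the ranking less sensitive to a single outlying result.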

-   -   B) The Reference Model allows investigation of different elements in a model or its risk equations to figure out which are more important;

The Reference Model provides a fitness score for each model built from multiple risk equations. By averaging these fitness scores for all models associated with a specific risk equation, it is possible to derive a fitness score for each risk equation.
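The averaging step can be sketched as below, assuming each model is identified by the tuple of risk equations it is built from. The equation labels and scores are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def equation_scores(model_fitness):
    """Average model fitness over all models that use each risk equation.

    model_fitness maps a tuple of equation names (one model) -> fitness score.
    """
    scores_by_equation = defaultdict(list)
    for equations, score in model_fitness.items():
        for equation in equations:
            scores_by_equation[equation].append(score)
    return {eq: mean(scores) for eq, scores in scores_by_equation.items()}


# Made-up fitness scores for three models built from MI and stroke equations.
example_models = {
    ("MI_1", "Stroke_1"): 2.0,
    ("MI_1", "Stroke_2"): 4.0,
    ("MI_2", "Stroke_1"): 6.0,
}

if __name__ == "__main__":
    print(equation_scores(example_models))
```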

It is possible to extend this concept beyond a score for a specific risk equation within a model. It is possible to drill down to the level of a specific element within a risk equation.

Different risk equations are built from different elements, including risk factors such as Age or Blood Pressure, numeric coefficients that specify the magnitudes of these risk factors, and mathematical elements such as addition, subtraction, power, log, exp, etc.

Once the score of a risk equation is known, it is possible to calculate the score for each element in an equation to figure out what element is most important. For example, it would be possible to find out the fitness score of Age or Blood Pressure by averaging the scores of the risk equations where they participate. This is one way to derive the importance of an element in the equation according to the calculated fitness scores. Again, the elements can then be ranked to find out which is most important.

It is also possible to calculate the fitness score for a factor in the risk equation by using the coefficient of the factor as a weight in a weighted average. This way, a factor with a relatively larger coefficient in a certain equation will be more influenced by the score of this equation. To make sure all equations use the same coefficients, a first order Taylor series expansion can be used, or the user can define those weights manually.
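A coefficient-weighted factor score might look like the sketch below. Using the absolute value of the coefficient as the weight is an assumption, and the coefficients are taken as already comparable across equations (for instance after the first order expansion mentioned above, or as supplied by the user). All names and numbers are illustrative.

```python
def factor_score(factor, equation_scores, coefficients):
    """Weighted average of equation scores, weighted by |coefficient|.

    equation_scores maps equation name -> fitness score.
    coefficients maps equation name -> {factor name: coefficient}.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for equation, score in equation_scores.items():
        weight = abs(coefficients.get(equation, {}).get(factor, 0.0))
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError(f"factor {factor!r} does not appear in any equation")
    return weighted_sum / total_weight


# Made-up scores and coefficients for two equations.
example_scores = {"E1": 2.0, "E2": 6.0}
example_coefficients = {
    "E1": {"Age": 3.0},
    "E2": {"Age": 1.0, "Blood Pressure": 2.0},
}

if __name__ == "__main__":
    print(factor_score("Age", example_scores, example_coefficients))
```

Here Age has the larger coefficient in E1, so its score sits closer to the score of E1 than a plain average would.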

In summary, the idea of the fitness score and associated ranking can be associated with a model, a population, an equation within a model, and a factor in an equation, or with other features associated with the model or population.

-   -   C) Details regarding Defining the Fitness Function;

The fitness score function that allows comparing different models and populations is user defined and the user can use their own function to define fitness. However, some function elements are useful to construct the fitness function and these can be mixed and matched to create useful fitness functions. These elements comprise:

-   -   C1) The norm of the difference between model and observed study results is perhaps the simplest fitness function.
    -   C2) Dividing the norm by the square root of the number of outcomes in the study allows comparing results from multiple studies with different numbers of outcomes. For example, it allows comparing the fitness of a study that only observes death to the fitness of a study that observes 2 outcomes: death from stroke and stroke. Without this correction, studies with more outcomes will be less fit just because they have more outcomes. This correction makes studies with a difference of 1 individual in all outcomes equivalent.
    -   C3) Weighing differences between model and observed study results allows the user to emphasize the importance of each specific outcome with a weight. For example, it allows the user to emphasize the weight of a stroke or the weight of a death from stroke to be more important than that of death from other causes.
    -   C4) Providing default study results if undefined by the study. If a study does not measure a result in a population, the system can provide a default number. This default number can be based on other studies, previous statistics, or human assessment.
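Elements C1 to C4 can be combined in one small function. The sketch below assumes a Euclidean norm and applies each weight to the squared difference; both choices, along with the function and parameter names, are illustrative, since the fitness function is user defined.

```python
import math


def fitness(model, observed, weights=None, defaults=None):
    """Fitness between simulated and observed outcome counts (lower is better).

    C1: Euclidean norm of the differences.
    C2: divided by the square root of the number of outcomes compared.
    C3: optional per-outcome weights (default weight 1.0).
    C4: optional default observed values for outcomes the study did not report.
    """
    weights = weights or {}
    defaults = defaults or {}
    total = 0.0
    n_outcomes = 0
    for outcome, simulated in model.items():
        target = observed.get(outcome, defaults.get(outcome))
        if target is None:
            continue  # no observed value and no default: outcome is skipped
        total += weights.get(outcome, 1.0) * (simulated - target) ** 2
        n_outcomes += 1
    if n_outcomes == 0:
        raise ValueError("no comparable outcomes")
    return math.sqrt(total) / math.sqrt(n_outcomes)


if __name__ == "__main__":
    one_outcome = fitness({"death": 11}, {"death": 10})
    two_outcomes = fitness({"death": 11, "stroke": 6}, {"death": 10, "stroke": 5})
    print(one_outcome, two_outcomes)
```

The two calls in the usage example differ from the observed results by 1 individual in every outcome, and the C2 normalization gives them the same score even though one study reports more outcomes.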

Combining these elements together in various combinations can create different fitness functions that provide different views on fitness.

The Reference Model provides a bird's-eye view of clinical trials, allowing the results from multiple trials to be systematically compared, while at the same time not requiring the use of proprietary data and allowing the accumulation of knowledge for competition.

The Reference Model is a MultiScale Model that allows combining information from different scales: 1) Individual patient level; 2) Study summary information; 3) Multi-study level; while providing a bird's-eye view at the Multi-study level. Combining multiple populations/models/hypotheses/outcomes, the Reference Model integrates a map that shows our current understanding of phenomena observed in studies.

The multi scale model allows using only available summary information to derive results, and once individual data become available, it is possible to increase resolution and use individual data and compare results to the summary data.

The Reference Model allows querying the results of multiple phenomena in multiple trials using a user defined metric in a unified fashion. This query can be defined visually as a table that is easier to grasp than a textual querying language such as SQL, with the table including: query keys for grouping, study information, and weights for averaging.
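A tabular query of this kind can be approximated with plain data structures: each result row carries a model, a study, and a fitness score, and the query supplies a grouping key plus a weight per study. The row layout, the field names, and the function `run_query` are assumptions for illustration and do not reflect the actual query interface.

```python
from collections import defaultdict


def run_query(rows, group_key, study_weights):
    """Group fitness results by a key and rank groups by weighted average.

    rows is a list of dicts with "model", "study" and "fitness" entries.
    study_weights maps study name -> weight; unlisted studies get weight 0.
    Returns (group, weighted average fitness) pairs, best (lowest) first.
    """
    weighted_sums = defaultdict(float)
    weight_totals = defaultdict(float)
    for row in rows:
        weight = study_weights.get(row["study"], 0.0)
        weighted_sums[row[group_key]] += weight * row["fitness"]
        weight_totals[row[group_key]] += weight
    averaged = {
        group: weighted_sums[group] / weight_totals[group]
        for group in weighted_sums
        if weight_totals[group] > 0.0
    }
    return sorted(averaged.items(), key=lambda item: item[1])


# Made-up result rows; the study names follow the populations named in the text.
example_rows = [
    {"model": "model_A", "study": "UKPDS", "fitness": 2.0},
    {"model": "model_A", "study": "ACCORD", "fitness": 6.0},
    {"model": "model_B", "study": "UKPDS", "fitness": 5.0},
    {"model": "model_B", "study": "ACCORD", "fitness": 1.0},
]

if __name__ == "__main__":
    print(run_query(example_rows, "model", {"UKPDS": 3.0, "ACCORD": 1.0}))
```

Changing only the weights column changes the ranking, which is the sense in which different queries give different views of the same results.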

Bounds can be used to figure out fitness ranges if information is unknown. For example, if the correlation between population parameters such as Blood Pressure and Cholesterol level is unknown, it is possible to define two bounding populations, one with no correlation and one fully correlated, and deduce the fitness range between those bounding populations.
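The bounding idea can be sketched by generating the two extreme populations and evaluating a fitness function on each. The Gaussian construction, the means and standard deviations, and the `fitness_of` callback are assumptions for illustration; in practice the callback would run the simulation on the population and score it against the study results.

```python
import math
import random


def bounding_population(n, rho, seed=0):
    """Generate (blood pressure, cholesterol) pairs with correlation rho.

    Uses two independent standard normal draws per individual; the means and
    standard deviations below are hypothetical placeholder values.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        z_chol = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        people.append((130.0 + 15.0 * z1, 200.0 + 30.0 * z_chol))
    return people


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


def fitness_range(fitness_of, n=5000):
    """Evaluate fitness on the uncorrelated and fully correlated populations."""
    scores = [fitness_of(bounding_population(n, rho)) for rho in (0.0, 1.0)]
    return min(scores), max(scores)


if __name__ == "__main__":
    # Toy stand-in for a simulation-based fitness: the absolute correlation.
    print(fitness_range(lambda population: abs(pearson(population))))
```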

Throughout this application, various Patents and Applications are referenced by number and inventor. The disclosures of these documents in their entireties are hereby incorporated by reference into this specification in order to more fully describe the state of the art to which this invention pertains.

The foregoing description of the embodiment has been provided for purposes of illustration and description, and is not intended to be exhaustive or to limit the disclosure. Individual limits or features of a particular embodiment are generally not limited to that particular embodiment but, where applicable, are interchangeable and can be used in a selected embodiment, even if not specifically shown or described. The same may also be varied in many ways while such variations are not to be regarded as a departure from the disclosure, and all such modifications are intended to be included within the scope of the disclosure.

I claim:
1. A method for calculating disease progression against multiple populations utilizing existing studies, risk models and hypotheses; Wherein, multiple disease simulation results are queried against multiple populations with multiple outcomes. Wherein, either summary data, or detailed individual data, or both can be used.
2. The method of claim 1, utilizing said method to obtain a fitness score which combines the results of multiple outcomes from clinical trials and disease model results into one number.
3. The method of claim 1, wherein the visualization of query results model/population fitness using color and ranking.
4. The method of claim 1, wherein the query is entered in visual tabular format using a Graphical User Interface.
5. The method of claim 1, wherein fitness is calculated between trial and disease model results.
6. The method of claim 1, wherein fitness is calculated between elements within the model/population.
7. The method of claim 1, wherein different fitness functions can be used and may include defaults and weights.
8. The method of claim 1, wherein a correction term, in use, to account for time to prevent model outdate.
9. A method wherein utilization of queries and visualization is used to improve disease models by comparing hypotheses; Whereby, by finding similarity among populations, the best model for a new population via the most fitting model to the closest populations considering the similarity is deduced; Whereby, hypothesis bounds can be used to figure out fitness ranges.
10. The method of claim 9, wherein a correction term, in use, to account for time to prevent model outdate.
11. The method of claim 9, utilizing said method to obtain a fitness score which combines the results of multiple outcomes from clinical trials and disease model results into one number.
12. The method of claim 9, wherein the visualization of query results model/population fitness using color and ranking.
13. The method of claim 9, wherein the query is entered in visual tabular format using a Graphical User Interface.
14. The method of claim 9, wherein fitness is calculated between trial and disease model results.
15. The method of claim 9, wherein fitness is calculated between elements within the model/population.
16. The method of claim 9, wherein different fitness functions can be used and may include defaults and weights.
17. The method of claim 9, wherein a correction term, in use, to account for time to prevent model outdate.